K-Scaffold subgraphs of Complex networks
Complex networks with high numbers of nodes or links are often difficult to analyse. However, not all elements contribute equally to their structural patterns. A small number of elements (the hubs) seem to play a particularly relevant role in organizing the overall structure around them. But other parts of the architecture (such as hub-hub connecting elements) are also important. In this letter we present a new type of substructure, named the K-scaffold subgraph, able to capture all the essential network components. Its key features, including the so-called critical scaffold graph, are analytically derived.



Introduction. Networks pervade complexity. How networks are organized at different scales is one of the main topics of complex network research. Some approaches are based on the study of given subgraphs, from the smaller network motifs [11] to K-cores [12], spanning trees, or gradient subgraphs obtained from a given internal system's dynamics [9].

One of the most studied subgraphs is the so-called K-core, formally defined by Bollobás in [13]. The K-core of a graph $\mathcal{G}$, $C_K(\mathcal{G})$, is the largest subgraph whose vertices all have degree $k \ge K$ within the subgraph. The behaviour of this subgraph and its percolation properties have been widely studied. K-cores display interesting features with several implications for the study of real networks, at both the theoretical and the applied level.
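The K-core definition above can be illustrated with the standard pruning procedure: repeatedly delete nodes whose remaining degree is below $K$. The following is a minimal sketch, not taken from the original text; the function name `k_core` and the dict-of-sets graph representation are assumptions made for illustration, and the graph is assumed undirected with no self-loops.

```python
def k_core(adj, K):
    """Return the node set of the K-core of an undirected graph.

    adj maps each node to the set of its neighbours (no self-loops).
    """
    # Work on a copy so the caller's graph is left untouched.
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    # Start with every node whose degree is already below K.
    queue = [v for v, nbrs in alive.items() if len(nbrs) < K]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue  # already removed through an earlier queue entry
        for u in alive.pop(v):
            alive[u].discard(v)
            if len(alive[u]) < K:
                queue.append(u)
    return set(alive)


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(k_core(toy, 2))
```

In this toy graph the pendant node is pruned for $K = 2$, which is precisely the kind of low-degree element that the K-core discards regardless of what it connects.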

Hubs are the center of attention of the K-core. They are responsible for efficient communication among network units, and their failure or removal can have dramatic consequences. But other graph components are also relevant to understanding network behavior. In particular, hubs are often related through other elements exhibiting low connectivity, the so-called connectors. Despite its relevance, the K-core fails to capture the hub-connector structure. This pattern is essential in highly disassortative or modular networks, where hub-hub connectors play a crucial role. In such networks, robustness against failures is strongly tied to hubs, but also to the hub-connector structure. Moreover, connectors can display high betweenness centrality despite their low connectivity, reinforcing the role of this kind of node in the non-local organization of the global topology and dynamics.

To overcome these limitations, we introduce a subgraph definition which captures the previous traits. Specifically, we consider a subgraph that includes the most connected nodes and their connectors, if any. In doing so, we want to explore whether there is some fundamental hub-connector subgraph and what its relevant properties are. Such a graph, the so-called K-scaffold subgraph, was recently introduced (in qualitative terms) within the context of the human proteome [19]. This network included only transcription factors, i.e., proteins that bind to DNA and are thus involved in regulating gene expression (Fig. 1(a)). Specifically, it was shown that an appropriate choice of relevant hubs and their connectors made it possible to define a functionally meaningful subgraph. This subgraph contained a large number of cancer-related proteins around which well-defined modules were organized as evolutionarily and functionally related subsets. Here, we define this subgraph in a rigorous way. We analytically characterize its properties and degree distributions, as well as the presence of a special class of minimal scaffold graph based on a critical percolation threshold.

K-Scaffold subgraphs. Let us consider a graph $\mathcal{G}(V, \Gamma)$, where $V$ is the set of nodes and $\Gamma$ the set of edges connecting them. The K-scaffold subgraph $S_K(\mathcal{G})$ is built from the degrees $k_i = k(v_i)$ of the nodes $v_i \in V$, but it also takes correlations into account. Specifically, a node $v_i \in V$ belongs to $S_K(\mathcal{G})$ if and only if (1) $K \le k_i$, or (2) $v_i$ is connected to some $v_j \in V$ such that $K \le k_j$. Thus, given a graph $\mathcal{G}(V, \Gamma)$ with adjacency matrix $a_{ij}$, the K-scaffold of $\mathcal{G}$ is defined as:



$$
S_{ij} = \begin{cases} a_{ij}, & \text{iff } (K \le k_i \lor K \le k_j) \\ 0, & \text{otherwise} \end{cases} \tag{1}
$$
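The definition in Eq. (1) translates almost directly into code. The following is a minimal sketch under stated assumptions: the graph is undirected, has no self-loops, and is stored as a dict of neighbour sets; the names `k_scaffold` and `from_edges` and the toy graph are hypothetical and introduced only for illustration.

```python
def k_scaffold(adj, K):
    """Return the K-scaffold as a dict: node -> set of kept neighbours.

    adj maps every node to the set of its neighbours (undirected graph,
    no self-loops). A link (i, j) is kept iff k_i >= K or k_j >= K.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    hubs = {v for v, k in degree.items() if k >= K}
    scaffold = {}
    for i, nbrs in adj.items():
        kept = {j for j in nbrs if i in hubs or j in hubs}
        if kept:  # nodes left without links are not part of the scaffold
            scaffold[i] = kept
    return scaffold


def from_edges(edges):
    """Hypothetical helper: build the neighbour-set dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy graph: hubs A and B (degree 4) joined by connector c; x and y are peripheral.
edges = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("A", "c"),
         ("B", "b1"), ("B", "b2"), ("B", "b3"), ("B", "c"),
         ("c", "x"), ("x", "y")]
scaffold = k_scaffold(from_edges(edges), K=4)
print(sorted(scaffold))
```

With $K = 4$ the connector `c` survives because it touches both hubs, while `x` and `y`, which have no hub among their neighbours, are dropped.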



An example of such a K-scaffold subgraph is shown in Fig. 1(b). This allows us to define, from a given graph $\mathcal{G}$, a nested hierarchy of subgraphs $S_K(\mathcal{G})$ such that:



$$
\ldots \subseteq S_{K+1}(\mathcal{G}) \subseteq S_K(\mathcal{G}) \subseteq S_{K-1}(\mathcal{G}) \subseteq \ldots \tag{2}
$$



To clarify which elements are really relevant, we also define a naked K-scaffold subgraph. From the K-scaffold subgraph, $S_K(\mathcal{G})$, the naked K-scaffold, $J_K(\mathcal{G})$, is obtained by removing all nodes having a single link (the "hair" of the graph) (Fig. 1(c)). Thus, from $S_{ij}$, it is easy to compute the adjacency matrix of the naked K-scaffold subgraph, namely:



$$
J_{ij} = S_{ij}\,(1 - \delta_{k_i,1})(1 - \delta_{k_j,1}) \tag{3}
$$



Additionally, if two or more connectors have an identical pattern of connectivity in $J_K(\mathcal{G})$ (i.e., they are connected to exactly the same hubs, understanding hubs as nodes with $k \ge K$), we renormalize each such set of connectors by replacing it with a single node. In this way, the renormalized K-scaffold subgraph keeps the relevant elements without redundancies (Fig. 1(d)).
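The two reduction steps just described (removing the "hair" and merging equivalent connectors) can be sketched as follows. This is only an illustration under explicit assumptions: degrees are counted inside the scaffold, hair removal is done in a single pass, the set of hubs is supplied by the caller, and the function names and toy scaffold are hypothetical.

```python
def naked_scaffold(scaffold):
    """Remove nodes that have a single link in the scaffold (one pass)."""
    hair = {v for v, nbrs in scaffold.items() if len(nbrs) == 1}
    naked = {}
    for v, nbrs in scaffold.items():
        remaining = nbrs - hair
        # Nodes left with no links at all are dropped as well.
        if v not in hair and remaining:
            naked[v] = remaining
    return naked


def renormalize(naked, hubs):
    """Merge connectors (non-hubs) that share exactly the same neighbours."""
    representative = {}  # neighbour set -> first connector seen with it
    for v in sorted(naked):
        if v not in hubs:
            representative.setdefault(frozenset(naked[v]), v)
    keep = (set(hubs) & set(naked)) | set(representative.values())
    return {v: {u for u in naked[v] if u in keep} for v in keep}


# Toy scaffold: hubs A and B, redundant connectors c and d, hairs a1 and b1.
toy_scaffold = {
    "A": {"a1", "c", "d"}, "B": {"b1", "c", "d"},
    "c": {"A", "B"}, "d": {"A", "B"},
    "a1": {"A"}, "b1": {"B"},
}
naked = naked_scaffold(toy_scaffold)
renormalized = renormalize(naked, hubs={"A", "B"})
print(sorted(naked), sorted(renormalized))
```

Here `c` and `d` link exactly the same pair of hubs, so only one of them is kept as the representative of that connector class.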
FIG. 1: (a) The Human Transcription Factor interaction Network (HTFN). (b) Its 11-scaffold subgraph. (c) The naked 11-scaffold subgraph. (d) The naked and renormalized 11-scaffold subgraph. The K-scaffold subgraph displays a fundamental hub-connector structure that organizes the general topology of the whole system. Data from [19].



Statistical Properties. Here we derive the main statistical features of the K-scaffold subgraph of an arbitrary, uncorrelated network $\mathcal{G}$. First, we compute the fraction $f$ of nodes in $\mathcal{G}$ belonging to $S_K(\mathcal{G})$, i.e., the probability for a randomly chosen node of $\mathcal{G}$ to belong to $S_K(\mathcal{G})$. If we define

$$
q_{<K} = \sum_{k<K} \frac{k P(k)}{\langle k \rangle}, \tag{4}
$$

the probability that a randomly chosen link leads to a node with degree lower than $K$, we can write $f$ as:

$$
f = 1 - \sum_{k<K} P(k)\,(q_{<K})^k, \tag{5}
$$

where the second term of Eq. (5) is the probability of finding a node with $k < K$ such that all of its $k$ first neighbours have $k' < K$.
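As a small numerical illustration of Eqs. (4) and (5), the sketch below evaluates $q_{<K}$ and $f$ for a degree distribution given as a dict `{k: P(k)}`. The function names and the two-valued toy distribution are assumptions made for this example only.

```python
def q_below(P, K):
    """Eq. (4): probability that a random link leads to a node with k < K."""
    mean_k = sum(k * p for k, p in P.items())
    return sum(k * p for k, p in P.items() if k < K) / mean_k


def scaffold_fraction(P, K):
    """Eq. (5): fraction of nodes that belong to the K-scaffold."""
    q = q_below(P, K)
    return 1.0 - sum(p * q ** k for k, p in P.items() if k < K)


# Toy distribution: half of the nodes have degree 1, half have degree 3.
P = {1: 0.5, 3: 0.5}
print(q_below(P, 2), scaffold_fraction(P, 2))
```

For this toy distribution and $K = 2$, the mean degree is 2, so $q_{<2} = 0.5/2 = 0.25$ and $f = 1 - 0.5 \times 0.25 = 0.875$: a degree-1 node is excluded only when its single neighbour is also a degree-1 node.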
We may or may not consider links between connectors, i.e., between nodes with $k < K$ that are each connected to nodes with $k \ge K$. In order to simplify algorithmic procedures, it is reasonable to avoid such connector-connector links. If we avoid this class of links, the probability $h$ for a randomly chosen link to belong to the K-scaffold is simply:

$$
h = 1 - (q_{<K})^2 \tag{6}
$$

However, for mathematical consistency, it is necessary to take this kind of link into account. If we do so, the probability for a randomly chosen link to belong to the K-scaffold is:




To complete our characterization, we find the degree distribution of $S_K(\mathcal{G})$. To do so, we need to define the probability that a node with degree less than $K$ is connected to exactly $k$ nodes whose connectivity is equal to or higher than $K$, namely:

$$
g(k, K) = \sum_{k \le i < K} P(i) \binom{i}{k} (1 - q_{<K})^k (q_{<K})^{i-k} \tag{8}
$$
The probability distribution for $k$'s above $K$ in the K-scaffold is the same as that of the substrate graph $\mathcal{G}$, up to the normalization factor $1/f$. To see this, note that, from the definition of $g(k, K)$, we have:

$$
\sum_{1 \le k < K} g(k, K) + \sum_{k \ge K} P(k) = f \tag{9}
$$



Thus, the normalized degree distribution of $S_K(\mathcal{G})$ will be:

$$
P_{S_K}(k) = \begin{cases} g(k, K)/f, & \text{iff } k < K \\ P(k)/f, & \text{otherwise} \end{cases} \tag{10}
$$
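The normalization behind Eqs. (8) to (10) can be checked numerically. The sketch below is an illustration rather than the authors' procedure: it assumes an uncorrelated network described by a dict `{k: P(k)}`, assumes $f > 0$, and uses a hypothetical function name.

```python
from math import comb


def scaffold_degree_distribution(P, K):
    """Degree distribution of the K-scaffold following Eqs. (8) and (10)."""
    mean_k = sum(k * p for k, p in P.items())
    q = sum(k * p for k, p in P.items() if k < K) / mean_k
    f = 1.0 - sum(p * q ** k for k, p in P.items() if k < K)  # assumed > 0
    dist = {}
    # Connectors: original degree i < K, with exactly k >= 1 hub neighbours.
    for k in range(1, K):
        g = sum(P.get(i, 0.0) * comb(i, k) * (1.0 - q) ** k * q ** (i - k)
                for i in range(k, K))
        if g > 0.0:
            dist[k] = g / f
    # Hubs keep all of their links, so their degrees are unchanged.
    for k, p in P.items():
        if k >= K:
            dist[k] = p / f
    return dist


dist = scaffold_degree_distribution({1: 0.5, 3: 0.5}, K=2)
print(dist, sum(dist.values()))
```

The printed total should be 1 up to rounding, which is the content of the normalization in Eq. (9).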



Minimal K-Scaffold subgraphs. The previous definitions refer to K-dependent subgraphs, $K$ being arbitrary. But we can ask whether some especially relevant value of $K$ is involved. In other words, since larger values of $K$ give smaller scaffold graphs, we might ask where this process ends and what is kept before the network becomes fragmented or too small. The question we are addressing concerns the existence of a characteristic scale. The minimal scaffold subgraph will label the minimal substructure capturing the fundamental hub-connector architecture of the net, if it exists. This subcritical subgraph will be located immediately above the percolation threshold of $\mathcal{G}$, considering how the K-scaffold performs node deletion. Following the configuration model [20], we consider very large random graphs with an arbitrary degree distribution [21]. Thus, given a graph $\mathcal{G}$, we compute $S_K(\mathcal{G})$ for increasing $K$ until it breaks down into many unconnected components. The probability for a node to belong to the K-scaffold subgraph will be a function of its connectivity $k$. We will refer to this function as $f_k$:



$$
f_k = \begin{cases} 1, & \text{iff } k \ge K \\ 1 - (q_{<K})^k, & \text{otherwise} \end{cases} \tag{11}
$$



Clearly, a node with degree $k < K$ has a probability $1 - (q_{<K})^k$ of being connected to at least one node with degree equal to or higher than $K$. Now we define the generating functions [23], taking into account that computing the K-scaffold implies deleting a given fraction of nodes [23], namely:



$$
F_0(x) = \sum_k P(k) f_k x^k \tag{12}
$$

Let us define $F_1(x)$ as:

$$
F_1(x) = \frac{1}{\langle k \rangle} \sum_k k P(k) f_k x^{k-1} = \frac{1}{\langle k \rangle} \frac{dF_0(x)}{dx} \tag{13}
$$
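The generating functions of Eqs. (12) and (13) are straightforward to evaluate for a finite degree distribution. The sketch below is a hedged illustration: the factory name `make_generating_functions` and the dict representation `{k: P(k)}` are assumptions, and $f_k$ follows Eq. (11).

```python
def make_generating_functions(P, K):
    """Return the pair (F0, F1) of Eqs. (12) and (13) as Python callables."""
    mean_k = sum(k * p for k, p in P.items())
    q = sum(k * p for k, p in P.items() if k < K) / mean_k

    def f_k(k):
        # Eq. (11): hubs always belong; connectors need at least one hub neighbour.
        return 1.0 if k >= K else 1.0 - q ** k

    def F0(x):
        return sum(p * f_k(k) * x ** k for k, p in P.items())

    def F1(x):
        # Derivative of F0 divided by the mean degree; k = 0 contributes nothing.
        return sum(k * p * f_k(k) * x ** (k - 1)
                   for k, p in P.items() if k > 0) / mean_k

    return F0, F1


F0, F1 = make_generating_functions({1: 0.5, 3: 0.5}, K=2)
print(F0(1.0), F1(1.0))
```

Note that $F_0(1) = \sum_k P(k) f_k$ reproduces the node fraction $f$ of Eq. (5), which gives a quick consistency check on any implementation.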



Using previous theoretical results [21, 22, 23], we compute the generating functions for the probability distribution of component sizes other than the giant component, if any. $H_1(x)$ is defined as the generating function for the probability that one end of a randomly chosen edge of the network $\mathcal{G}$, when computing its K-scaffold, leads to a connected component with a given number of nodes. This includes the probability that such a component contains zero nodes, because of the deletion of nodes of $\mathcal{G}$ when computing the K-scaffold. This happens with probability $1 - F_1(1)$, the edge-based counterpart of $1 - f = 1 - F_0(1)$ discussed above. The end of the edge can be occupied by a node with $k$ outgoing edges, distributed according to $F_1(x)$ [23]. This leads us to the self-consistency equation [21, 22]:



$$
H_1(x) = 1 - F_1(1) + x F_1(H_1(x)) \tag{14}
$$



And the generating function for the size of the component to which a randomly chosen node belongs will be [21, 22]:



$$
H_0(x) = 1 - F_0(1) + x F_0(H_1(x)) \tag{15}
$$



The size of the giant component $S_K$, which we will further identify with the K-scaffold subgraph, will be



$$
S_K = F_0(1) - F_0(v) \tag{16}
$$



where $v$ is the first non-trivial, physically relevant solution of

$$
v = 1 - F_1(1) + F_1(v) \tag{17}
$$
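Equations (16) and (17) can be solved numerically by fixed-point iteration. The sketch below is one possible approach, not necessarily the one used for the simulations in the text: it starts the iteration at $v = 0$, which for a non-decreasing map converges to the smallest solution in $[0, 1]$, and it uses a truncated Poisson distribution as an assumed stand-in for an Erdős-Rényi graph. All function names and the truncation `kmax` are hypothetical.

```python
from math import exp, factorial


def poisson(c, kmax=60):
    """Truncated Poisson degree distribution with mean c (an assumption)."""
    return {k: exp(-c) * c ** k / factorial(k) for k in range(kmax + 1)}


def giant_scaffold_size(P, K, tol=1e-12, max_iter=100000):
    """Relative size S_K of the giant component, Eqs. (16) and (17)."""
    mean_k = sum(k * p for k, p in P.items())
    q = sum(k * p for k, p in P.items() if k < K) / mean_k

    def f_k(k):
        return 1.0 if k >= K else 1.0 - q ** k

    def F0(x):
        return sum(p * f_k(k) * x ** k for k, p in P.items())

    def F1(x):
        return sum(k * p * f_k(k) * x ** (k - 1)
                   for k, p in P.items() if k > 0) / mean_k

    v = 0.0  # iterate v -> 1 - F1(1) + F1(v) from below
    for _ in range(max_iter):
        new_v = 1.0 - F1(1.0) + F1(v)
        if abs(new_v - v) < tol:
            v = new_v
            break
        v = new_v
    return F0(1.0) - F0(v)


print(giant_scaffold_size(poisson(3.0), K=1))
```

For $K = 1$ every non-isolated node counts as a hub, so the result should coincide with the ordinary giant component of the Poisson graph, which offers a simple sanity check before scanning larger values of $K$.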



We can now look for a singularity in the average size of components, $\langle s \rangle$. Immediately above this point we define the minimal $S_K(\mathcal{G})$. Knowing that



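$$
\langle s \rangle = \left. \frac{dH_0(x)}{dx} \right|_{x=1} \tag{18}
$$

we find, after some algebra [21, 22, 23],

$$
\langle s \rangle = F_0(1) + \frac{F_1(1) \left. \dfrac{dF_0(x)}{dx} \right|_{x=1}}{1 - \left. \dfrac{dF_1(x)}{dx} \right|_{x=1}} \tag{19}
$$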



with a singularity when $[dF_1(x)/dx]_{x=1} = 1$. Now, using:

$$
\left. \frac{dF_1(x)}{dx} \right|_{x=1} = \sum_k \frac{k(k-1) f_k P(k)}{\langle k \rangle} \tag{20}
$$



we can derive the percolation condition for a K-scaffold graph from a given substrate graph $\mathcal{G}(V, \Gamma)$ with given average connectivity $\langle k \rangle$ and degree distribution $P(k)$. We compute this condition taking into account the above critical condition, $[dF_1(x)/dx]_{x=1} = 1$. (Recall that computations are performed considering the successive pruning of $\mathcal{G}$ by increasing $K$.) Knowing that:

$$
\sum_k \frac{k P(k)}{\langle k \rangle} = 1 \tag{21}
$$
FIG. 2: Numerical simulations of the relative size of the giant component $S_K$ of the K-scaffold subgraphs of model degree distributions against $K$. $S_K$ is computed as $S_K = F_0(1) - F_0(v)$, where $v$ is the numerical approximation of the first non-trivial, physically relevant solution of the equation $v = 1 - F_1(1) + F_1(v)$ (see text). The plots display $S_K$ for (a) an Erdős-Rényi graph with $\langle k \rangle = 15$; (b) a scale-free network with $P(k) \propto k^{-\alpha}$, $\alpha = 2.3$, for which no $K_c$ can be properly identified (see text), and where the apparent cut-off can be due to finite-size effects of the simulation; (c) an exponential network with $P(k) \propto e^{-k/\kappa}$, $\kappa = 13$; (d) a power law with exponential cut-off, $P(k) \propto k^{-\alpha} e^{-k/\kappa}$, with $\alpha = 2.5$ and $\kappa = 52$.



and since the critical condition, written using Eq. (20), is equivalent to:

$$
\sum_k^{\infty} k \left( k f_k - f_k - 1 \right) P(k) = 0 \tag{22}
$$

the percolation condition for a K-scaffold, $S_K$, is:

$$
\sum_k^{\infty} k \left( k f_k - f_k - 1 \right) P(k) > 0 \tag{23}
$$

We can easily see this condition as the extension of the Molloy and Reed criterion [25] to the K-scaffold, $S_K(\mathcal{G})$:

$$
\sum_k^{\infty} k(k-2) P(k) > \sum_{k<K} k(k-1) (q_{<K})^k P(k) \tag{24}
$$
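Condition (24) can be evaluated directly for any finite degree distribution, which also gives a simple way to locate the largest $K$ for which the scaffold still percolates. The sketch below is an illustration under assumptions: the distribution is a dict `{k: P(k)}`, a truncated Poisson distribution stands in for an Erdős-Rényi graph, and the names `percolates` and `critical_K` are hypothetical.

```python
from math import exp, factorial


def poisson(c, kmax=60):
    """Truncated Poisson degree distribution with mean c (an assumption)."""
    return {k: exp(-c) * c ** k / factorial(k) for k in range(kmax + 1)}


def percolates(P, K):
    """Evaluate condition (24) for the K-scaffold of an uncorrelated network."""
    mean_k = sum(k * p for k, p in P.items())
    q = sum(k * p for k, p in P.items() if k < K) / mean_k
    lhs = sum(k * (k - 2) * p for k, p in P.items())
    rhs = sum(k * (k - 1) * q ** k * p for k, p in P.items() if k < K)
    return lhs > rhs


def critical_K(P):
    """Largest K satisfying condition (24), or None if no K does."""
    best = None
    for K in range(1, max(P) + 2):
        if percolates(P, K):
            best = K
    return best


P = poisson(3.0)
print(critical_K(P))
```

Because the right-hand side of (24) can only grow with $K$ while the left-hand side is fixed, scanning $K$ upwards and keeping the last value that satisfies the inequality is sufficient.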

Note that the right-hand side of this inequality is always finite, whereas the left-hand side may diverge. Indeed, note that $q_{<K} \le 1$, thus,

$$
(q_{<K})^k \le 1 \tag{25}
$$

Furthermore, $P(k) \le 1$. Thus, the sum

$$
\sum_{k<K} k(k-1) (q_{<K})^k P(k) < \infty \tag{26}
$$

is always finite, because $k$ is bounded by $K$. It is straightforward that:

$$
\sum_k^{\infty} k(k-2) P(k) = \langle k^2 \rangle - 2 \langle k \rangle \tag{27}
$$

k 

Thus, a finite $K$ for the critical scaffold will exist if and only if the degree distribution of the network has a finite second moment $\langle k^2 \rangle$. An Erdős-Rényi graph, for example, will display a critical K-scaffold, since its second moment is finite, $\langle k^2 \rangle_{ER} = \langle k \rangle^2 + \langle k \rangle$. But for arbitrarily large scale-free networks with realistic exponents ($2 < \alpha < 3$), we find that there is no such minimal K-scaffold. This is due to the divergence of the second moment. Thus, condition (24) holds for all $K$. The implications deserve further study, since this means that the hub-connector structure appears at all scales.
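As a brief check of the divergence invoked above, assume a continuous approximation of a pure power law $P(k) = C k^{-\alpha}$ with $2 < \alpha < 3$ and a maximum degree $k_{\max}$ (both the approximation and the symbol $k_{\max}$ are introduced here only for illustration). The first two moments then scale as:

$$
\langle k \rangle \approx C \int_1^{k_{\max}} k^{1-\alpha}\, dk = \frac{C}{\alpha - 2} \left( 1 - k_{\max}^{2-\alpha} \right), \qquad
\langle k^2 \rangle \approx C \int_1^{k_{\max}} k^{2-\alpha}\, dk = \frac{C}{3 - \alpha} \left( k_{\max}^{3-\alpha} - 1 \right).
$$

As $k_{\max} \to \infty$ the mean stays finite while the second moment grows as $k_{\max}^{3-\alpha}$, so the left-hand side of condition (24) grows without bound whereas its right-hand side remains finite for any fixed $K$.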

Numerical simulations (Fig. 2) show that if we introduce a cut-off in the degree distribution, a characteristic scale $K_c$ is present. In E-R graphs, the size of the K-scaffold displays an abrupt decay beyond $\langle k \rangle$. Finally, the size of $S_K(\mathcal{G})$ displays a critical $K$ in exponential networks. Note that, by definition, if $S_K(\mathcal{G})$ percolates, so do the corresponding naked and renormalized counterparts.

Discussion. K-Scaffold subgraphs can be easily measured on any arbitrary network and can be useful to detect both key elements and the topological components that glue them together. If topological organization is linked with functionality, particularly in relation to hubs, the scaffold of a complex network should be able to capture the relevant subsystem. For the human transcription factor network it was found that, for $K = 11$, a small set of proteins having relevant cellular functions (including oncogenes, tumor suppressor genes and the TATA-binding protein) was obtained, all of them related through intermediate connector proteins. This is in agreement with the disassortative character of cellular networks. Since each hub was associated with a group of functionally related TFs, the connectors were actually relating different parts of the protein machinery. Other real systems have also been analysed and provided further confirmation of the relevance of the scaffold approach (Corominas Murtra et al., in preparation). Further extensions and properties of this subgraph, together with the analysis of finite-size effects associated with real systems, will be presented elsewhere.